reasoning program
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > France > Auvergne-Rhône-Alpes > Lyon > Lyon (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
BOOST: Bootstrapping Strategy-Driven Reasoning Programs for Program-Guided Fact-Checking
Hu, Qisheng, Long, Quanyu, Wang, Wenya
Large language model pipelines have improved automated fact-checking for complex claims, yet many approaches rely on few-shot in-context learning with demonstrations that require substantial human effort and domain expertise. Among these, program-guided reasoning, which decomposes claims into function calls and executes reasoning programs, has shown particular promise but remains limited by the need for manually crafted demonstrations. Fundamentally, the principles underlying effective reasoning program generation remain underexplored. In this work, we introduce BOOST, a bootstrapping approach for automated few-shot reasoning program generation. BOOST iteratively refines explicit, data-driven guidelines as meta-rules for guiding demonstration creation, using a critique-refine loop that eliminates the need for human intervention. This enables a seamless transition from zero-shot to few-shot program-guided learning, enhancing interpretability and effectiveness. Experimental results show that BOOST outperforms prior few-shot baselines in both zero-shot and few-shot settings for complex claim verification.
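The abstract names a critique-refine loop over explicit guidelines but does not specify its interface. The sketch below shows only the generic control flow of such a loop, assuming hypothetical `critique` and `refine` callables (in BOOST these roles would presumably be filled by LLM calls); it is an illustration, not the authors' implementation.

```python
from typing import Callable, List


def critique_refine(
    guidelines: List[str],
    critique: Callable[[List[str]], List[str]],
    refine: Callable[[List[str], List[str]], List[str]],
    max_rounds: int = 5,
) -> List[str]:
    """Refine a guideline list until the critic raises no issues or rounds run out."""
    for _ in range(max_rounds):
        issues = critique(guidelines)  # hypothetical critic, e.g. an LLM reviewing demos
        if not issues:
            break  # critic is satisfied: stop early
        guidelines = refine(guidelines, issues)  # hypothetical refiner addressing the issues
    return guidelines


# Toy stand-ins (hypothetical): the critic insists on a decomposition rule.
def toy_critique(g: List[str]) -> List[str]:
    return [] if "decompose multi-hop claims" in g else ["missing decomposition rule"]


def toy_refine(g: List[str], issues: List[str]) -> List[str]:
    return g + ["decompose multi-hop claims"]


result = critique_refine(["verify each entity"], toy_critique, toy_refine)
```

The `max_rounds` cap is a design assumption that keeps the loop bounded even if the critic never converges.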
Efficiently Serving LLM Reasoning Programs with Certaindex
Fu, Yichao, Chen, Junda, Zhu, Siqi, Fu, Zheyu, Dai, Zhongdongming, Qiao, Aurick, Zhang, Hao
The rapid evolution of large language models (LLMs) has unlocked their capabilities in advanced reasoning tasks such as mathematical problem-solving, code generation, and legal analysis. Central to this progress are inference-time reasoning algorithms, which refine outputs by exploring multiple solution paths at the cost of increased compute demands and response latencies. Existing serving systems fail to adapt to the scaling behaviors of these algorithms or to the varying difficulty of queries, leading to inefficient resource use and unmet latency targets. We present Dynasor, a system that optimizes inference-time compute for LLM reasoning queries. Unlike traditional engines, Dynasor tracks and schedules requests within reasoning queries and uses Certaindex, a proxy that measures statistical reasoning progress based on model certainty, to guide compute allocation dynamically. Dynasor co-adapts scheduling with reasoning progress: it allocates more compute to hard queries, reduces compute for simpler ones, and terminates unpromising queries early, balancing accuracy, latency, and cost. On diverse datasets and algorithms, Dynasor reduces compute by up to 50% in batch processing and sustains 3.3x higher query rates or 4.7x tighter latency SLOs in online serving.
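To make "certainty-guided compute allocation" concrete, the sketch below uses one simple certainty proxy in the spirit of Certaindex: one minus the normalized entropy of the answers produced by sampled reasoning paths (as in self-consistency). The proxy, the threshold, and the function names are assumptions for illustration; Dynasor's actual definition and scheduler may differ.

```python
import math
from collections import Counter
from typing import Hashable, Iterable, List, Optional, Tuple


def certainty(answers: List[Hashable]) -> float:
    """Assumed proxy: 1 - normalized entropy of the empirical answer distribution."""
    n = len(answers)
    if n <= 1:
        return 0.0  # a single sample carries no agreement signal
    counts = Counter(answers)
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return 1.0 - entropy / math.log(n)  # entropy peaks at log(n) when all answers differ


def solve_with_early_exit(
    answer_stream: Iterable[Hashable],
    threshold: float = 0.8,
    min_samples: int = 4,
    max_samples: int = 16,
) -> Tuple[Optional[Hashable], int]:
    """Consume sampled answers until certainty is high enough or the budget runs out."""
    answers: List[Hashable] = []
    for ans in answer_stream:
        answers.append(ans)
        if len(answers) >= min_samples and certainty(answers) >= threshold:
            break  # confident enough: stop spending compute on this query
        if len(answers) >= max_samples:
            break  # per-query budget exhausted
    if not answers:
        return None, 0
    majority = Counter(answers).most_common(1)[0][0]  # majority vote over samples
    return majority, len(answers)
```

An easy query whose samples all agree stops after `min_samples`, while a query whose samples keep disagreeing consumes the full budget; a real scheduler could instead reallocate that budget to other queries.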
Fact-Checking Complex Claims with Program-Guided Reasoning
Pan, Liangming, Wu, Xiaobao, Lu, Xinyuan, Luu, Anh Tuan, Wang, William Yang, Kan, Min-Yen, Nakov, Preslav
Fact-checking real-world claims often requires collecting multiple pieces of evidence and applying complex multi-step reasoning. In this paper, we present Program-Guided Fact-Checking (ProgramFC), a novel fact-checking model that decomposes complex claims into simpler sub-tasks that can be solved using a shared library of specialized functions. We first leverage the in-context learning ability of large language models to generate reasoning programs that guide the verification process. We then execute the program by delegating each sub-task to the corresponding sub-task handler. This process makes our model both explanatory and data-efficient, providing clear explanations of its reasoning process and requiring minimal training data. We evaluate ProgramFC on two challenging fact-checking datasets and show that it outperforms seven fact-checking baselines across different settings of evidence availability, with explicit output programs that benefit human debugging. Our code and data are publicly available at https://github.com/mbzuai-nlp/ProgramFC.
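The execution step described above, delegating each program step to a sub-task handler, can be sketched as a small interpreter. The function names `Question`, `Verify`, and `Predict` are modeled loosely on ProgramFC's style of reasoning program; the toy lookup tables standing in for real QA and verification models are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# One program step: (output variable, function name, argument).
Step = Tuple[str, str, Union[str, List[str]]]

# Hypothetical toy knowledge standing in for real retrieval / QA / verification models.
TOY_QA = {"Who directed Titanic?": "James Cameron"}
TOY_FACTS = {"James Cameron was born in Canada.": True}


def question(q: str) -> str:
    return TOY_QA.get(q, "unknown")


def verify(claim: str) -> bool:
    return TOY_FACTS.get(claim, False)


def predict(fact_vars: List[str], env: Dict[str, object]) -> str:
    # Aggregate sub-verdicts: the claim holds only if every checked fact holds.
    return "SUPPORTS" if all(env[v] for v in fact_vars) else "REFUTES"


def run_program(program: List[Step]) -> Dict[str, object]:
    env: Dict[str, object] = {}
    for out, func, arg in program:
        if func == "Question":
            env[out] = question(arg.format(**env))  # fill in answers from earlier steps
        elif func == "Verify":
            env[out] = verify(arg.format(**env))
        elif func == "Predict":
            env[out] = predict(arg, env)
        else:
            raise ValueError(f"unknown function: {func}")
    return env


program = [
    ("answer_1", "Question", "Who directed Titanic?"),
    ("fact_1", "Verify", "{answer_1} was born in Canada."),
    ("label", "Predict", ["fact_1"]),
]
print(run_program(program)["label"])  # SUPPORTS
```

Keeping every intermediate value in `env` is what makes the output inspectable: a reader can see which sub-question or sub-claim drove the final label.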
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Workflow (0.93)
- Research Report (0.82)
- Health & Medicine (0.93)
- Leisure & Entertainment > Sports > Motorsports (0.93)
- Media > Film (0.68)
- Leisure & Entertainment > Sports > Hockey (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.67)
ELASTIC: Numerical Reasoning with Adaptive Symbolic Compiler
Zhang, Jiaxin, Moshfeghi, Yashar
Numerical reasoning over text is a challenging task in Artificial Intelligence (AI), requiring both reading comprehension and numerical reasoning abilities. Previous approaches use numerical reasoning programs to represent the reasoning process. However, most works do not separate the generation of operators and operands, the key components of a numerical reasoning program, which limits their ability to generate such programs for complicated tasks. In this paper, we introduce the numEricaL reASoning with adapTive symbolIc Compiler (ELASTIC) model, which consists of RoBERTa as the encoder and a compiler with four modules: Reasoning Manager, Operator Generator, Operands Generator, and Memory Register. ELASTIC is robust when conducting complicated reasoning. It is also domain-agnostic, supporting the addition of diverse operators regardless of how many operands each operator takes. Experiments show that ELASTIC achieves 68.96 execution accuracy and 65.21 program accuracy on the FinQA dataset and 83.00 program accuracy on the MathQA dataset, significantly outperforming previous state-of-the-art models.
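The sketch below does not generate programs; it only executes a FinQA-style program string such as `subtract(5829, 5735), divide(#0, 5735)`, where `#0` refers to the result of the first step. This illustrates why operators and operands can be treated separately and why a memory register for intermediate results is useful. The operator table (including variable-arity `add`) and the example numbers are assumptions for illustration; FinQA constants such as `const_100` are not handled.

```python
import operator
import re
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical operator table; each operator accepts any number of operands.
OPERATORS: Dict[str, Callable[[List[float]], float]] = {
    "add": sum,
    "subtract": lambda xs: reduce(operator.sub, xs),
    "multiply": lambda xs: reduce(operator.mul, xs),
    "divide": lambda xs: reduce(operator.truediv, xs),
    "max": max,
}

STEP = re.compile(r"(\w+)\(([^)]*)\)")  # matches "op(arg, arg, ...)"


def execute(program: str) -> float:
    memory: List[float] = []  # memory register: result of step i is stored at index i
    for op, raw_args in STEP.findall(program):
        args: List[float] = []
        for tok in (a.strip() for a in raw_args.split(",")):
            # "#k" reads an earlier result; anything else is parsed as a literal number.
            args.append(memory[int(tok[1:])] if tok.startswith("#") else float(tok))
        memory.append(OPERATORS[op](args))
    if not memory:
        raise ValueError("empty program")
    return memory[-1]


print(execute("subtract(5829, 5735), divide(#0, 5735)"))
```

Because each operator is looked up by name and applied to a list of operands, adding a new operator only requires a new table entry, which loosely mirrors the extensibility claim in the abstract.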
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering
Chen, Zhiyu, Li, Shiyang, Smiley, Charese, Ma, Zhiqiang, Shah, Sameena, Wang, William Yang
With the recent advances in large pre-trained language models, researchers have achieved record performance on NLP tasks that mostly focus on language pattern matching. The community's challenge is shifting from how to model language to how to imitate complex, human-like reasoning abilities. In this work, we investigate the finance domain, which involves real-world, complex numerical reasoning. We propose a new large-scale dataset, ConvFinQA, aiming to study the chain of numerical reasoning in conversational question answering. Our dataset poses a great challenge in modeling long-range, complex numerical reasoning paths in real-world conversations. We conduct comprehensive experiments and analyses with both neural symbolic methods and prompting-based methods to provide insights into the reasoning mechanisms of these two families of approaches. We believe our new dataset should serve as a valuable resource to push forward the exploration of real-world, complex reasoning tasks as the next research focus. Our dataset and code are publicly available at https://github.com/czyssrs/ConvFinQA.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Diego County > San Diego (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.86)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
A Robustly Optimized Long Text to Math Models for Numerical Reasoning On FinQA
Zhang, Renhui, Zhang, Youwei, Yu, Yao
Numerical reasoning is required to solve most problems in everyday life, but it has been neglected in previous artificial intelligence research. The FinQA challenge was organized to strengthen the study of numerical reasoning: participants are asked to predict the numerical reasoning program that solves a financial question. Submissions are evaluated by both execution accuracy and program accuracy. In this paper, we present our approach, which develops models with different specialized capabilities and fuses their strengths. Overall, our approach takes 1st place in the FinQA challenge, with 71.93% execution accuracy and 67.03% program accuracy.
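The two metrics mentioned above can be sketched as follows: execution accuracy compares the numeric result of running the predicted program with the gold answer, while program accuracy compares the programs themselves. This sketch uses a relative tolerance for numbers and plain whitespace-insensitive string matching for programs; both are simplifying assumptions, and the official FinQA scorer may treat equivalent programs more leniently.

```python
import math
from typing import Sequence


def execution_accuracy(pred: Sequence[float], gold: Sequence[float], rel_tol: float = 1e-4) -> float:
    """Fraction of questions whose executed answer matches the gold answer."""
    if not gold:
        return 0.0
    hits = sum(math.isclose(p, g, rel_tol=rel_tol, abs_tol=1e-9) for p, g in zip(pred, gold))
    return hits / len(gold)


def _normalize(program: str) -> str:
    return "".join(program.split())  # ignore all whitespace


def program_accuracy(pred: Sequence[str], gold: Sequence[str]) -> float:
    """Fraction of predicted programs that exactly match the gold program (assumed exact match)."""
    if not gold:
        return 0.0
    hits = sum(_normalize(p) == _normalize(g) for p, g in zip(pred, gold))
    return hits / len(gold)


pred_programs = ["add(1, 2)", "multiply(2, 3)"]
gold_programs = ["add(1,2)", "multiply(3, 2)"]
print(execution_accuracy([3.0, 6.0], [3.0, 6.0]))       # 1.0
print(program_accuracy(pred_programs, gold_programs))   # 0.5
```

The toy example shows why execution accuracy is typically at least as high as program accuracy under strict matching: `multiply(2, 3)` yields the right answer but does not match `multiply(3, 2)` character for character.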
A Curious New Result of Resolution Strategies in Negation-Limited Inverters Problem
Ando, Ruo, Takefuji, Yoshiyasu
The negation-limited inverter problem is generally known as the puzzle of constructing an inverter from AND gates, OR gates, and only a few NOT gates (inverters). In this paper, we report a curious new result about the effectiveness of two powerful ATP (Automated Theorem Proving) strategies on the negation-limited inverter problem. The two resolution strategies are UR (Unit Resulting) resolution and hyper-resolution. In our experiments, we perform two kinds of automated circuit construction: 3-input/output inverters and a 4-input/output BCD counter circuit, both constructed with a limited number of inverters. Curiously, UR resolution turns out to be drastically faster than hyper-resolution as measured by the size of the SOS (Set of Support). We also discuss the syntactic and semantic criteria that might cause the considerable difference in computation cost between UR resolution and hyper-resolution.
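To make the puzzle concrete, the sketch below checks by exhaustive enumeration a well-known hand construction that inverts three inputs using AND/OR gates and only two NOT gates. It illustrates the target of the 3-input/output case; it is not the paper's ATP-based construction, and the function name is hypothetical.

```python
from itertools import product
from typing import Tuple


def inverter3(x: int, y: int, z: int) -> Tuple[int, int, int]:
    """Invert three bits with AND/OR gates and exactly two NOT gates."""
    maj = (x & y) | (y & z) | (x & z)          # at least two inputs are 1
    r = 1 - maj                                # NOT gate #1: at most one input is 1
    s = 1 - ((r & (x | y | z)) | (x & y & z))  # NOT gate #2: exactly zero or two inputs are 1
    # From here on, only AND/OR gates combine r, s, and the inputs.
    nx = (r & s) | (r & (y | z)) | (s & y & z)
    ny = (r & s) | (r & (x | z)) | (s & x & z)
    nz = (r & s) | (r & (x | y)) | (s & x & y)
    return nx, ny, nz


for x, y, z in product((0, 1), repeat=3):
    assert inverter3(x, y, z) == (1 - x, 1 - y, 1 - z)
print("all 8 input combinations inverted correctly")
```

The two negated signals together encode how many inputs are 1 (0, 1, 2, or 3), and each output is recovered from that count plus the other two inputs using monotone gates only.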
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- North America > Mexico > Quintana Roo > Cancún (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)